Collect

Extraction Methods & Tools

The following diagram illustrates how the various extraction/replication methods might fall and interact within a DSP solution:

SAP RFC & BOA RFC

RFC allows for the extraction of SAP pooled and clustered tables. The RFC extraction method calls an RFC on the SAP application server to run a SQL command. Data is returned in a Binary Large Object (BLOB) in blocks (e.g., 10,000 records) that are then parsed by using SAP RFC, BOA RFC or BODS RFC. The RFC is called again to extract the next block of records (in this example, the next 10,000 records). The process repeats until the entire table has been extracted.

Assemble

Assemble is a tool within the DSP that creates and executes packages to transfer data between systems. The tool uses an ODBC connection with a non individual-specific account with read-only access. Once a connection is established to a source, packages to refresh data are relatively easy to create and process in the DSP.  Assemble runs multiple threads and is the BackOffice Associates® preferred extraction method for all but the largest tables.  Additional configuration and tuning can be required for the largest tables that are over several million records (refer to the DSP Collect Delta Configuration image).

SSIS (SQL Server Integration Services)

A component of MS SQL Server, SSIS replaces Data Transformation Services (DTS). SSIS is typically faster than Assemble because a single program is both reading the source and loading the target. Depending on variables like hardware, connections and table width, SISS can achieve extraction speeds of several million records per minute. However, using SSIS to extract too many tables simultaneously can require additional CPUs and memory on the application server. Consequently, SSIS should be used judiciously to extract tables that cannot be efficiently extracted with other methods. The same delta configuration that can be implemented for Assemble can also be for SSIS when needed (refer to the DSP Collect Delta Configuration diagram).

DBMoto®

DBMoto® functionality (which is housed in Collect) that uses an ODBC connection to download data in any of three different ways: Refresh, Change Data Capture (or Mirroring) and Synchronization.

In Refresh mode, an entire table is extracted. A one-time full-table refresh is required for any tables that will be set up for change data capture or synchronization. Change data capture mirrors data changes made in a source system to a target system. Synchronization pushes data changes both directions between two systems. 

When performing change data capture or synchronization, DBMoto® leverages native change logs within the source to identify the subset of records that have been changed and need to be updated in the target system. If the source system does not contain native or accessible change logs, then DBMoto uses triggers on the source system to push changes.

Choose the Right Extraction Method

To aid in selecting the right extraction methods for the project, refer to the following table:

Connection Type

Extraction Method

Pros

Cons

Examples

RFC

  • BOA RFC
  • SAP RFC
  • BODS RFC
  • Ability to extract pooled tables, clustered tables and long text
  • Standard SAP protocol
  • Refresh can be scheduled within Collect
  • Slower than ODBC
  • Single thread to limit the number of jobs running at one time

Material Long Text

ODBC

Assemble

  • Multi-threaded for fast extraction of all but the largest tables (BOA preferred method)
  • Not tied to a specific version of SQL Database Management System (DBMS)
  • Supports fixed and delimited widths, and Excel.
  • Can configure batch size, select execution by field (e.g., change dates) and the number of parallel threads
  • Query can be optimized to be executed on source database
  • Very quick to initially build in the DSP
  • Refresh can be scheduled within Collect
  • For use only with transparent tables
  • Diminished performance when pulling very large record sets (> several million)
  • Material Master
  • Customer
  • Product Hierarchy
  • Check & Configuration tables

 

SSIS

  • Improved performance over Assemble for many system types. Can extract millions of records per minute, depending on hardware, software, table width, etc.
  • Filters can be applied to tables that maintain create and update dates in order to limit the data extracted. In this case, the logic and overhead to perform the delta on the full table copy is done in the DSP.
  • Refresh can be scheduled in Collect
  • For use only with transparent tables
  • Running many table extractions simultaneously with SSIS can require additional CPUs and memory
  • Requires additional configuration in DSP to set up as a delta update

Largest master data or transactional tables where Assemble extraction is not sufficient. Historical examples include:

  • Open Items
  • Bill History

 

DBMoto (Collect)

  • Can extract data through Refresh, Change Data Capturing (mirroring) or Synchronization modes
  • Change data capture and synchronization leverage a source system’s native change logs or use triggers to identify the subset of data that has changed, thereby reducing the amount of data being extracted
  • Small footprint
  • Highly configurable through GUI

If trigger-driven (no native change logs), changes to the source system are required (triggers and trigger tables for the tables that are going to be replicated with DBMoto)

Largest master data or transactional tables where Assemble extraction is not sufficient. Historical examples include:

  • Open Items
  • Bill History

Collect Delta Configuration

For large tables with requisite date fields, a delta extraction process can be built in within DSP, as conceptualized in the following diagram: